ABSTRACT

New plots based on minimum and maximum order statistics are proposed. They are visually appealing, easy to
understand, stable at the extreme tails, and capture all the information about the distribution of the data. Unlike the
quantile-quantile plot, the minimum and maximum plots give more weight to the data at the extreme tails, so they can
be regarded as a complement to the quantile-quantile plot. The minimum and maximum plots are used to obtain a
nonparametric visualization for the Gumbel and Weibull distributions. Moreover, the minimum and maximum normal
plots are introduced and compared with the quantile-quantile plot. The new plots have the further advantage that they
can be applied to discrete distributions.

1. INTRODUCTION

Graphical presentation of data is an important tool in the sciences. A good graph conveys a great deal of information
and can be used to draw new conclusions, while a bad graph can be misleading and confusing. Given a random sample of
univariate data points, a pertinent question is whether the sample comes from some specified distribution F. Decision
techniques are based on how close the empirical distribution of the sample is to the distribution F for a given sample
size n.

The quantile-quantile (Q-Q) plot is a commonly used device for testing the goodness-of-fit of a sample graphically and
informally, in an exploratory way. The sample quantiles are plotted against the theoretical quantiles, or against the
quantiles of another sample, and a visual check is made to see whether the points lie close to a straight line; see
Chambers et al. (1983), Cleveland (1994), Scott (1992) and Cleveland and McGill (1988). The pattern of points in the
plot is used to compare the shapes of distributions, providing a graphical view of how properties such as location,
scale and skewness are similar or different in the two distributions. The use of Q-Q plots to compare two samples of
data can be viewed as a nonparametric approach to comparing their underlying distributions. A Q-Q plot is generally a
more powerful approach than the common technique of comparing histograms of the two samples, but it requires more
skill to interpret; see Makkonen (2008) and Wilk and Gnanadesikan (1968).

Extreme order statistics plots (Min-Max plots) are proposed, based on the minimum and maximum order statistics from
samples of size k. The plots can be constructed in parametric and nonparametric ways. The Min-Max plots give more
weight to the data at the extreme tails of the distribution, so they complete the picture of the data given by the
Q-Q plot, especially at the extreme tails. Min-Max plots are used to obtain a nonparametric characterization of the
Gumbel and Weibull distributions. Since a variety of estimation and inferential procedures in practice depend on the
assumption of normality, the Min-normal plot and Max-normal plot are introduced and compared with the Q-Q plot.
These plots characterize and capture all the information about the whole distribution of the data. The pattern of the
points in the Min-Max plots is used to compare the shapes of distributions nonparametrically. As with the Q-Q plot,
the data are plotted against theoretical or sample extreme order statistics and a visual check is made to see whether
the points lie close to a straight line, but the Min-Max plots are more stable at the tails of the distribution than
the Q-Q plot.

The extreme order statistics plots and their characterization of probability distributions are derived in Section 2.
The Min and Max normal plots are introduced in Section 3. The nonparametric visualization for the Gumbel and Weibull
distributions is proposed in Section 4. An extension of the Min and Max plots to discrete distributions is introduced
in Section 5. Two applications are studied in Section 6. Section 7 is devoted to the conclusions.

2. EXTREME ORDER STATISTICS PLOTS 
2.1. Extreme Order Statistics 

Let $X_1, \ldots, X_n$ be a sample from a distribution with distribution function $F$, probability function $f(x)$ and
quantile function $x(F)$. When the $X_i$'s are arranged in ascending order of magnitude and written as

$$X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n},$$

$X_{r:n}$ is the $r$th order statistic. Since the event $(X_{r:n} \le x)$ occurs if and only if at least $r$ of the
$X_i$'s are less than or equal to $x$, $F_{r:n}$ is expressible in terms of $F$ as the binomial tail probability

$$F_{r:n}(x) = \Pr(X_{r:n} \le x) = \sum_{j=r}^{n} \binom{n}{j} F^{j}(x)\,[1 - F(x)]^{n-j}.$$

The expected value of the order statistic is

$$E(X_{r:n}) = r\binom{n}{r} \int_0^1 x(F)\,F^{r-1}(1 - F)^{n-r}\,dF, \quad r \le n.$$

This can be rewritten as

$$E(X_{r:n}) = r\binom{n}{r}\, E\!\left[x(F)\,F^{r-1}(1 - F)^{n-r}\right];$$

see David (1981).
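
As a quick numerical check of the expectation formula, the integral can be evaluated directly from a quantile function. The following is a minimal sketch in base R; the function name `expected_order_stat` is introduced here for illustration only and is not part of the original program.

```r
# E(X_{r:n}) = r * choose(n, r) * integral_0^1 x(u) u^(r-1) (1-u)^(n-r) du,
# where qfun is the quantile function x(F) (hypothetical helper).
expected_order_stat <- function(qfun, r, n) {
  integrand <- function(u) qfun(u) * u^(r - 1) * (1 - u)^(n - r)
  r * choose(n, r) * integrate(integrand, lower = 0, upper = 1)$value
}

# Uniform(0, 1): the known result is E(U_{r:n}) = r / (n + 1).
e_unif_2_5 <- expected_order_stat(qunif, r = 2, n = 5)
e_unif_5_5 <- expected_order_stat(qunif, r = 5, n = 5)
```

Any base R quantile function that accepts a vector of probabilities (for example `qexp`) can be passed as `qfun`, provided the integral exists.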

Let

$$M_{n:n} = X_{n:n} = \max\{X_1, \ldots, X_n\}$$

denote the maximum of the first $n$ random variables. Its distribution function is given by

$$F_{n:n}(x) = [F(x)]^{n}, \quad -\infty < x < \infty.$$

As pointed out by Arnold et al. (2008), knowledge of the distribution of $X_{n:n}$ clearly determines $F(x)$
completely, since

$$F(x) = [F_{n:n}(x)]^{1/n}, \quad -\infty < x < \infty.$$

Moreover, Chan (1967) has shown that if $E|X| < \infty$ then $F(x)$ is uniquely determined by the sequence

$$\{E(X_{n:n}) : n = 1, 2, 3, \ldots\}.$$

Let

$$M_{1:n} = X_{1:n} = \min\{X_1, \ldots, X_n\}$$

denote the minimum of the first $n$ random variables. Its distribution function is given by

$$F_{1:n}(x) = 1 - [1 - F(x)]^{n}, \quad -\infty < x < \infty.$$

Knowledge of the distribution of $X_{1:n}$ clearly determines $F$ completely, since

$$F(x) = 1 - [1 - F_{1:n}(x)]^{1/n}, \quad -\infty < x < \infty.$$

Also, Chan (1967) has shown that if $E|X| < \infty$ then $F$ is uniquely determined by the sequence

$$\{E(X_{1:n}) : n = 1, 2, 3, \ldots\}.$$

For example,

$$E(X_{1:n}) = 1/n, \quad n = 1, 2, 3, \ldots$$

if and only if $F$ is unit exponential ($F(x) = 1 - \exp(-x)$, $x > 0$),

$$E(X_{n:n}) = 2n/(2n + 1), \quad n = 1, 2, 3, \ldots$$

if and only if $F$ is triangular ($F(x) = x^2$, $0 < x < 1$), and

$$E(X_{1:n}) = 1/(2^{n} - 1), \quad n = 1, 2, 3, \ldots$$

if and only if $F$ is geometric ($\Pr(X = x) = 2^{-x-1}$, $x = 0, 1, 2, \ldots$); see, for example, Huang (1989).
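
The three characterizing sequences can be checked numerically. The sketch below uses base R only; the helper names `min_exp`, `max_tri` and `min_geo` are hypothetical, and the geometric case truncates the infinite sum at an assumed 200 terms.

```r
# Unit exponential: E(X_{1:n}) = n * integral_0^1 x(u) (1 - u)^(n-1) du, x(u) = qexp(u).
min_exp <- function(n) {
  integrate(function(u) n * qexp(u) * (1 - u)^(n - 1), 0, 1)$value
}

# Triangular F(x) = x^2 on (0, 1): quantile x(u) = sqrt(u),
# so E(X_{n:n}) = n * integral_0^1 sqrt(u) u^(n-1) du.
max_tri <- function(n) {
  integrate(function(u) n * sqrt(u) * u^(n - 1), 0, 1)$value
}

# Geometric P(X = x) = 2^(-x-1), x = 0, 1, ...: P(X >= x) = 2^(-x), hence
# E(X_{1:n}) = sum over x >= 1 of P(X >= x)^n = sum of 2^(-n x).
min_geo <- function(n, terms = 200) sum(2^(-n * seq_len(terms)))

n <- 1:5
res <- cbind(
  exp_num = sapply(n, min_exp), exp_theory = 1 / n,
  tri_num = sapply(n, max_tri), tri_theory = 2 * n / (2 * n + 1),
  geo_num = sapply(n, min_geo), geo_theory = 1 / (2^n - 1)
)
```

Each numerical column of `res` should agree with the neighbouring theoretical column up to integration error.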

2.2. Min and Max Plots 

For given data of size $n$, $x_1, x_2, \ldots, x_n$, the theoretical min curve based on the expected value of order
statistics is defined as

$$E(X_{1:k}) = k\,E\!\left[x(F)\,(1 - F)^{k-1}\right], \quad k = 1, 2, \ldots, n.$$

From Downton (1966) and Elamir and Seheult (2003) this can be estimated as

$$\hat{E}(X_{1:k}) = \binom{n}{k}^{-1} \sum_{i=1}^{n} \binom{n-i}{k-1}\, x_{i:n}, \quad k = 1, 2, \ldots, n.$$

The theoretical max curve based on the expected value of order statistics is defined as

$$E(X_{k:k}) = k\,E\!\left[x(F)\,F^{k-1}\right], \quad k = 1, 2, \ldots, n.$$

From Downton (1966) and Elamir and Seheult (2004) this can be estimated as

$$\hat{E}(X_{k:k}) = \binom{n}{k}^{-1} \sum_{i=1}^{n} \binom{i-1}{k-1}\, x_{i:n}, \quad k = 1, 2, \ldots, n,$$

where $x_{1:n} \le \cdots \le x_{n:n}$ are the ordered data.

The nonparametric extreme order statistics plot consists of two curves. The min curve is

Min curve := ($\hat{E}(X_{1:k})$ or $x_{i:n}$ versus $k$ or $i$), $k = 1, \ldots, n$, $i = 1, 2, \ldots, n$.

This curve runs from the average $\bar{x} = \hat{E}(X_{1:1})$ to the minimum value $x_{1:n} = \hat{E}(X_{1:n})$.

The max curve is plotted as

Max curve := ($\hat{E}(X_{k:k})$ or $x_{i:n}$ versus $k$ or $i$), $k = 1, \ldots, n$, $i = 1, 2, \ldots, n$.

This curve runs from the average $\bar{x} = \hat{E}(X_{1:1})$ to the maximum value $x_{n:n} = \hat{E}(X_{n:n})$.

Together, the two curves give the whole picture of the distribution function of a random variable X for the given
data. Each curve on its own should also reflect all the information about the distribution of X for the given data.
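
The two sample estimators are weighted sums of the ordered data and can be sketched in a few lines of R. The names `est_min` and `est_max` and the small data vector are illustrative assumptions; the weights follow the same `choose()` pattern as the program in Appendix A.

```r
# Sample estimate of E(X_{1:k}): the average of the minima of all k-subsets of x.
est_min <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(n - i, k - 1) * xs) / choose(n, k)
}

# Sample estimate of E(X_{k:k}): the average of the maxima of all k-subsets of x.
est_max <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(i - 1, k - 1) * xs) / choose(n, k)
}

x <- c(4.1, 2.3, 7.8, 5.0, 3.6)   # illustrative data, not from the paper
k <- seq_along(x)
min_curve <- sapply(k, function(kk) est_min(x, kk))   # runs from mean(x) to min(x)
max_curve <- sapply(k, function(kk) est_max(x, kk))   # runs from mean(x) to max(x)
```

Plotting `min_curve` and `max_curve` against `k` gives the two curves: both start at the sample mean for k = 1 and end at the sample minimum and maximum for k = n.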

Extreme order statistics plots can compare a theoretical distribution with any data set using

Min plot := ($E(X_{1:k})$ versus $\hat{E}(X_{1:k})$), $k = 1, \ldots, n$,

and

Max plot := ($E(X_{k:k})$ versus $\hat{E}(X_{k:k})$), $k = 1, \ldots, n$,

where $\hat{E}$ denotes the estimate from the data. Also, to check whether two data sets $x$ and $y$ come from the
same distribution, the fully nonparametric plots are

Min line plot := ($\hat{E}(X_{1:k})$ versus $\hat{E}(Y_{1:k})$), $k = 1, \ldots, n$,

and

Max line plot := ($\hat{E}(X_{k:k})$ versus $\hat{E}(Y_{k:k})$), $k = 1, \ldots, n$.

In all these cases the Min and Max plots should show a relationship close to a straight line.

3. MIN AND MAX NORMAL PLOTS

Since a variety of estimation and inferential procedures in practice depend on the assumption of normality, the
graphical characterization of the normal distribution is very important, and the most common graph is the normal
quantile-quantile plot. The Min and Max normal plots complete the picture given by the normal Q-Q plot, especially at
the extreme tails of the distribution. The Min-normal plot is obtained by plotting the exact expected minimum order
statistics of size k from the standard normal distribution, which can be computed with the EnvStats package in R,
against the estimated minimum order statistics from the data:

Min normal plot := (evNormOrdStatsScalar(1, k) versus $\hat{E}(X_{1:k})$, k = 1, 2, ..., n)

Similarly, the Max normal plot is

Max normal plot := (evNormOrdStatsScalar(k, k) versus $\hat{E}(X_{k:k})$, k = 1, 2, ..., n)

For normal data, the points in the Min-normal and Max-normal plots should lie on or close to a straight line.
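
If EnvStats is not available, the theoretical coordinates can be approximated in base R by integrating the standard normal quantile function. This is only a sketch under that assumption: `norm_min` and `norm_max` are hypothetical names, and the integral is split at 0.5 so that each piece has a singular endpoint on one side only.

```r
# Helper: integrate g over (0, 1) in two halves.
os_integral <- function(g) {
  integrate(g, 0, 0.5)$value + integrate(g, 0.5, 1)$value
}

# Expected minimum and maximum of k independent standard normal variables.
norm_min <- function(k) os_integral(function(u) k * qnorm(u) * (1 - u)^(k - 1))
norm_max <- function(k) os_integral(function(u) k * qnorm(u) * u^(k - 1))

k <- 1:10
theo_min <- sapply(k, norm_min)
theo_max <- sapply(k, norm_max)
```

For k = 2 the expected maximum is the known value 1/sqrt(pi), about 0.564, and by symmetry the expected minimum is the negative of the expected maximum for every k.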


Figure 1 shows the Min-normal, Max-normal and normal Q-Q plots for data simulated from the normal distribution
(500, 20) with n = 200. The Min-normal plot clearly gives more weight to the lower tail of the distribution, while the
Max-normal plot gives more weight to the upper tail. Note also that the Min and Max normal plots are more stable than
the normal Q-Q plot at the extreme tails of the distribution.

[Panels: Min-normal plot; Max-normal plot; Normal Q-Q Plot]

Figure 1: Min Normal, Max Normal and Q-Q Normal Plots for Simulated Data from Normal Distribution (500, 20) and n = 200.


Moreover, the location and scale parameters can be estimated from the Min and Max normal plots. The population mean
can be estimated from the largest value in the Min plot and the lowest value in the Max plot, both of which correspond
to k = 1; here $\hat{\mu} \approx 501$. Gini's measure of variability (G) can be estimated from the highest two points
in the Min plot and the lowest two points in the Max plot, since

$$G = E(Y_{2:2} - Y_{1:2}) = 2E(Y_{1:1} - Y_{1:2}) = 2E(Y_{2:2} - Y_{1:1}).$$

The estimated Gini's measure is 2(501 - 491) = 20 from the Min plot and 2(511 - 501) = 20 from the Max plot; see
Elamir (2013).
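
The plot-based estimate of G can be reproduced numerically. The sketch below, with hypothetical helper names and an illustrative data vector, computes 2 times the difference between the estimated maximum for k = 2 and the mean, and compares it with the average absolute difference over all pairs of observations.

```r
# Sample estimate of E(X_{k:k}) (same weights as the max estimator in Appendix A).
est_max <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(i - 1, k - 1) * xs) / choose(n, k)
}

# G read off the Max plot: twice the gap between the points for k = 2 and k = 1.
gini_from_plot <- function(y) 2 * (est_max(y, 2) - mean(y))

# G computed directly as the mean absolute difference over all pairs.
gini_pairwise <- function(y) mean(combn(y, 2, function(p) abs(p[1] - p[2])))

y <- c(491, 503, 478, 512, 520, 499, 505, 487)   # illustrative values only
g_plot <- gini_from_plot(y)
g_pair <- gini_pairwise(y)
```

The two values coincide, which is why the first two points of the Max plot (or of the Min plot) are enough to read off Gini's measure.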

Figure 2 shows the Min, Max and Q-Q normal plots for data simulated from the Laplace distribution (500, 20) with
n = 200. The curvature is clearly visible in the Min normal plot. An R program for the Min-normal and Max-normal plots
is given in Appendix A.

[Horizontal axes: Theoretical Minima; Theoretical Maxima; Theoretical Quantiles]

Figure 2: Min, Max and Q-Q Normal Plots for Simulated Data from Laplace Distribution (500, 20) and n = 200.


4. NONPARAMETRIC VISUALIZATION 


The extreme order statistic plots can be used for nonparametric visualization for Gumbel and Weibull distributions 
as follows. 

4.1. Gumbel Distribution 

This distribution is used to model the distribution of the maximum or the minimum of a number of samples of
various distributions. It is useful in predicting the chance that an extreme earthquake, flood or other natural
disaster will occur; see Gumbel (1954). The density function of the Gumbel distribution is

$$f(x; \alpha, \beta) = \beta^{-1} \exp[-(x - \alpha)/\beta]\, \exp\{-\exp[-(x - \alpha)/\beta]\}, \quad -\infty < x < \infty,$$

and the cumulative distribution function is

$$F(x) = \exp\{-\exp[-(x - \alpha)/\beta]\}.$$

From Arnold et al. (2008) the expected maximum order statistic is

$$E(X_{k:k}) = \alpha + 0.5772\beta + \beta \log k.$$

A completely nonparametric visualization for the Gumbel distribution can therefore be obtained from the estimated
maxima as

Max plot := ($\log k$, $\hat{E}(X_{k:k})$), $k = 1, 2, \ldots, n$.

Also, the quantile function is

$$x(F) = \alpha - \beta \log(-\log F).$$

The quantile plot is

quantile plot := ($\log(-\log F)$, $x_i$), $i = 1, 2, \ldots, n$.


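[Panel: histogram for Gumbel (horizontal axis: data), with the quantile plot and the Max plot]

Figure 3: Histogram, Quantile and Max Plots for Simulated Data from Gumbel (10, 5) Distribution and n = 200

It is clear from Figure 3 that the Max plot follows an increasing straight line and the quantile plot a decreasing
straight line. This is a very strong indication of a Gumbel distribution. Moreover, the slopes of the quantile and Max
plots are -4.934 and 5.025, and the intercepts are 9.889 and 12.948, respectively. These agree with the theoretical
slopes $-\beta = -5$ and $\beta = 5$ and the theoretical intercepts $\alpha = 10$ and
$\alpha + 0.5772\beta \approx 12.89$.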
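
The linear relationship behind the Max plot can be checked without simulation by integrating the Gumbel quantile function and regressing the result on log k. This is a minimal sketch using the parameter values of Figure 3; `gumbel_q` and `gumbel_max` are hypothetical names.

```r
alpha <- 10; beta <- 5          # parameter values of the simulated example
euler_gamma <- -digamma(1)      # Euler's constant, about 0.5772

# Gumbel quantile function x(F) = alpha - beta * log(-log F).
gumbel_q <- function(u) alpha - beta * log(-log(u))

# E(X_{k:k}) = k * integral_0^1 x(u) u^(k-1) du, integrated in two halves so
# that each piece has a singular endpoint on one side only.
gumbel_max <- function(k) {
  g <- function(u) k * gumbel_q(u) * u^(k - 1)
  integrate(g, 0, 0.5)$value + integrate(g, 0.5, 1)$value
}

k <- 1:20
e_max <- sapply(k, gumbel_max)
fit <- lm(e_max ~ log(k))
slope <- unname(coef(fit)[2])       # should be close to beta
intercept <- unname(coef(fit)[1])   # should be close to alpha + 0.5772 * beta
```

With sample estimates in place of `e_max`, the same regression gives the fitted slope and intercept of the Max plot.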

4.2. Weibull Distribution 

The Weibull distribution is used in many areas such as survival analysis, reliability engineering, weather
forecasting and wind speed analysis; see Johnson et al. (1994). The density function of the Weibull distribution is

$$f(x; \lambda, \delta) = \frac{\delta}{\lambda} \left(\frac{x}{\lambda}\right)^{\delta - 1} e^{-(x/\lambda)^{\delta}}, \quad x > 0, \ \delta, \lambda > 0.$$

The cumulative distribution function is

$$F(x) = 1 - e^{-(x/\lambda)^{\delta}}.$$

The expected minimum and maximum order statistics can be obtained from Arnold et al. (2008) as

$$E(X_{1:k}) = \lambda\,\Gamma(1 + 1/\delta)\, k^{-1/\delta}$$

and

$$E(X_{k:k}) = \lambda\,\Gamma(1 + 1/\delta)\, k \sum_{j=0}^{k-1} (-1)^{j} \binom{k-1}{j} (1 + j)^{-1 - 1/\delta}.$$

A completely nonparametric visualization for the Weibull distribution may be obtained by taking the logarithm of
$E(X_{1:k})$:

$$\log E(X_{1:k}) = \log \lambda + \log \Gamma(1 + 1/\delta) - \frac{1}{\delta} \log k, \quad k = 1, 2, \ldots, n.$$

Therefore, using the estimated minima,

log Min plot := ($\log k$, $\log \hat{E}(X_{1:k})$), $k = 1, 2, \ldots, n$.

This indicates that the Weibull distribution with density
$f(x; \lambda, \delta) = \frac{\delta}{\lambda} (x/\lambda)^{\delta-1} e^{-(x/\lambda)^{\delta}}$ can be characterized
by the decreasing linear relationship between the logarithm of the minimum order statistics and the logarithm of the
ranks, whatever the values of the parameters $\lambda$ and $\delta$. This plot also characterizes the exponential
distribution, for which $\delta = 1$ and the slope is therefore $-1$.

The quantile function can be obtained from the cumulative distribution function as

$$\log[-\log(1 - F)] = -\delta \log \lambda + \delta \log x.$$

Therefore, the log quantile plot is

log quantile := ($\log x_i$, $\log[-\log(1 - F)]$), $i = 1, 2, \ldots, n$.

This is also known as the Weibull plot; see Johnson et al. (1994). It indicates that the Weibull distribution can be
characterized by the increasing linear relationship between $\log[-\log(1 - F)]$ and $\log x_i$.

Figure 4: Histogram, Log Quantile and Log Min Plots for Simulated Data from Weibull (1, 0.5) Distribution and n = 200

It is clear from Figure 4 that the log Min plot follows a decreasing straight line and the log quantile plot an
increasing straight line. This is a very strong indication of a Weibull distribution. Moreover, the slopes of the two
plots are -1.85 and 0.53 and the intercepts are 0.445 and -0.062, respectively.
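
For comparison with the fitted values, the theoretical slope and intercept of the log Min plot follow directly from the closed form of $E(X_{1:k})$. The sketch below assumes $\lambda = 1$ and $\delta = 0.5$, as in Figure 4; `weibull_min` and `weibull_max` are hypothetical names for the two closed forms.

```r
lambda <- 1; delta <- 0.5   # parameter values of the simulated example

# Closed forms for the expected minimum and maximum of k Weibull variables.
weibull_min <- function(k) lambda * gamma(1 + 1 / delta) * k^(-1 / delta)
weibull_max <- function(k) {
  j <- 0:(k - 1)
  lambda * gamma(1 + 1 / delta) * k *
    sum((-1)^j * choose(k - 1, j) * (1 + j)^(-1 - 1 / delta))
}

# Regress log E(X_{1:k}) on log k: slope -1/delta, intercept
# log(lambda) + log(gamma(1 + 1/delta)).
k <- 1:50
fit <- lm(log(weibull_min(k)) ~ log(k))
slope <- unname(coef(fit)[2])
intercept <- unname(coef(fit)[1])

# Numerical cross-check of the closed form for k = 3 via the quantile function.
check_min3 <- integrate(
  function(u) 3 * qweibull(u, shape = delta, scale = lambda) * (1 - u)^2, 0, 1
)$value
```

Here the theoretical slope is -2 and the theoretical intercept is log 2, about 0.693, which gives a reference for the fitted values -1.85 and 0.445 obtained from the sample of size 200.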

5. DISCRETE DATA 


The Min-Max plots have the advantage that they can be applied to discrete distributions to test the goodness-of-fit
of a sample graphically and informally, in an exploratory way.

The binomial distribution with parameters m and p is the discrete probability distribution of the number of successes
in a sequence of m independent yes/no trials, each of which yields success with probability p. The probability mass
function is

$$f(x; m, p) = \binom{m}{x} p^{x} (1 - p)^{m-x}, \quad x = 0, 1, \ldots, m,$$

and the cumulative distribution function is

$$F(x) = \sum_{j=0}^{x} \binom{m}{j} p^{j} (1 - p)^{m-j}.$$

From Arnold et al. (2008) the expected minimum and maximum order statistics can be obtained as

$$E(X_{1:k}) = \sum_{x=0}^{m-1} [1 - F(x)]^{k}$$

and

$$E(X_{k:k}) = \sum_{x=0}^{m-1} \{1 - [F(x)]^{k}\}.$$

The Bernoulli distribution is the special case of the binomial distribution with m = 1: the random variable takes the
value 1 with success probability p and the value 0 with failure probability q = 1 - p. The expected minimum and
maximum order statistics then take the simple forms

$$E(X_{1:k}) = p^{k} \quad \text{and} \quad E(X_{k:k}) = 1 - q^{k}, \quad k = 1, 2, \ldots, n.$$
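
The binomial sums are straightforward to evaluate with `pbinom`. The sketch below uses hypothetical function names `binom_min` and `binom_max` and confirms that m = 1 reproduces the Bernoulli forms.

```r
# E(X_{1:k}) = sum over x = 0..m-1 of [1 - F(x)]^k for Binomial(m, p).
binom_min <- function(k, m, p) {
  x <- 0:(m - 1)
  sum((1 - pbinom(x, size = m, prob = p))^k)
}

# E(X_{k:k}) = sum over x = 0..m-1 of {1 - [F(x)]^k} for Binomial(m, p).
binom_max <- function(k, m, p) {
  x <- 0:(m - 1)
  sum(1 - pbinom(x, size = m, prob = p)^k)
}

# Bernoulli special case (m = 1): p^k and 1 - q^k.
p <- 0.5; k <- 4
bern_min <- binom_min(k, m = 1, p = p)
bern_max <- binom_max(k, m = 1, p = p)

# For k = 1 both sums reduce to the binomial mean m * p.
mean_check <- binom_min(1, m = 10, p = 0.3)
```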

For given p, the proposed plots for the Bernoulli distribution are

Min plot := ($p^{k}$ versus $\hat{E}(X_{1:k})$, k = 1, 2, ..., n)

and

Max plot := ($1 - q^{k}$ versus $\hat{E}(X_{k:k})$, k = 1, 2, ..., n)

Figure 5 shows the Min and Max plots for data simulated from the Bernoulli distribution (p = 0.5) with n = 100
against the theoretical Min and Max values $p^{k}$ and $1 - q^{k}$. Figure 6 shows the same plots for data simulated
from the Bernoulli distribution (p = 0.05). Both figures show straight lines.


[Panels: Bernoulli Min plot with p=0.5; Bernoulli Max plot with p=0.5]

Figure 5: Min and Max Plots for Simulated Data from Bernoulli Distribution (p = 0.5) versus the Theoretical
$0.5^{k}$ and $1 - (0.5)^{k}$, and n = 100

[Panels: Bernoulli Min plot with p=0.05; Bernoulli Max plot with p=0.05]

Figure 6: Min and Max Plots for Simulated Data from Bernoulli Distribution (p = 0.05) versus the Theoretical
$0.05^{k}$ and $1 - (0.95)^{k}$, and n = 100

Figure 7 shows the Min and Max plots for data simulated from the Bernoulli distribution (p = 0.5) with n = 100
against the theoretical values $0.80^{k}$ and $1 - (0.20)^{k}$, that is, against a Bernoulli distribution with
p = 0.8. It is clear from the plots that the data do not come from the Bernoulli distribution with p = 0.8.
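
The comparison behind Figures 5 and 7 can be imitated with a deterministic stand-in for the simulated sample. In the sketch below, `est_min` and `est_max` are hypothetical names for the sample estimators, and the data vector of 50 zeros and 50 ones replaces a random Bernoulli(0.5) sample so that the result is reproducible.

```r
# Sample estimates of E(X_{1:k}) and E(X_{k:k}) from the ordered data.
est_min <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(n - i, k - 1) * xs) / choose(n, k)
}
est_max <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(i - 1, k - 1) * xs) / choose(n, k)
}

x <- rep(c(0, 1), times = 50)   # stand-in for a Bernoulli(0.5) sample, n = 100
k <- 1:10
emp_min <- sapply(k, function(kk) est_min(x, kk))
emp_max <- sapply(k, function(kk) est_max(x, kk))

# Largest deviation from the theoretical values under p = 0.5 and under p = 0.8.
dev_p05 <- max(abs(emp_min - 0.5^k), abs(emp_max - (1 - 0.5^k)))
dev_p08 <- max(abs(emp_min - 0.8^k), abs(emp_max - (1 - 0.2^k)))
```

The estimates stay close to $0.5^{k}$ and $1 - 0.5^{k}$ but far from the values implied by p = 0.8, which is the kind of departure Figure 7 displays.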

[Panels: Bernoulli Min plot; Bernoulli Max plot]

Figure 7: Min and Max Plots for Simulated Data from Bernoulli Distribution (p = 0.5) versus the Theoretical
$0.80^{k}$ and $1 - (0.20)^{k}$, and n = 100

The geometric distribution, which models the number x of independent trials, each with success probability p, up to
and including the first success, is defined as

$$f(x) = p\,q^{x-1}, \quad x = 1, 2, \ldots$$

From Margolin and Winokur (1967) the expected minimum order statistic can be obtained as

$$E(X_{1:k}) = \frac{1}{1 - q^{k}}$$

and the expected maximum order statistic as

$$E(X_{k:k}) = k \sum_{j=0}^{k-1} \frac{(-1)^{j} \binom{k-1}{j}}{(j + 1)\,(1 - q^{j+1})}.$$

Therefore, for given p the proposed Min and Max plots for the geometric distribution are

Min plot := ($1/(1 - q^{k})$ versus $\hat{E}(X_{1:k})$, k = 1, 2, ..., n)

and

Max plot := ($E(X_{k:k})$ versus $\hat{E}(X_{k:k})$, k = 1, 2, ..., n)
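
The theoretical coordinates for the geometric plots can be computed as follows. This is a sketch with hypothetical function names; `geom_max_tail` is an independent check that sums the tail probabilities P(max > j) over an assumed 2000 terms rather than using the finite alternating sum.

```r
# Expected minimum of k geometric variables on {1, 2, ...}: 1 / (1 - q^k).
geom_min <- function(k, p) 1 / (1 - (1 - p)^k)

# Expected maximum via the finite alternating sum.
geom_max <- function(k, p) {
  q <- 1 - p; j <- 0:(k - 1)
  k * sum((-1)^j * choose(k - 1, j) / ((j + 1) * (1 - q^(j + 1))))
}

# Independent check: E(max) = sum over j >= 0 of P(max > j) = 1 - (1 - q^j)^k.
geom_max_tail <- function(k, p, terms = 2000) {
  q <- 1 - p; j <- 0:terms
  sum(1 - (1 - q^j)^k)
}

p <- 0.25
k <- 1:8
theo_min <- sapply(k, geom_min, p = p)
theo_max <- sapply(k, geom_max, p = p)
theo_max_check <- sapply(k, geom_max_tail, p = p)
```

Note that the alternating sum can lose accuracy for large k, in which case the tail-sum form is the safer of the two.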





6. APPLICATION 

6.1. Application 1 

An experiment was performed to determine whether two forms of iron (Fe²⁺ and Fe³⁺) are retained differently. If
one form of iron were retained especially well, it would be the better dietary supplement. The investigators divided
36 mice randomly into two groups of 18 each. Both groups were given iron at a concentration of 1.2 millimolar; a count
was taken for each mouse at a later time, and the percentage of iron retained was calculated; see Rice (1995). The data
are given in Table 1. Do these data come from the same distribution?


Table 1: The Percentage of Iron Retained at Concentration 1.2 Millimolar

Y = Fe³⁺:    2.2    2.93   3.08   3.49   4.11   4.95   5.16   5.54   5.68
             6.25   7.25   7.90   8.85  11.96  15.54  15.89  18.3   18.59

Y1 = Fe²⁺:   4.04   4.16   4.42   4.93   5.49   5.77   5.86   6.28   6.97
             7.06   7.78   9.23   9.34   9.91  13.46  18.4   23.89  26.39


Figure 8 shows the Min line, Max line and Q-Q plots for these data. The means can be read from the graph as 8.20 for
Fe³⁺ and 9.60 for Fe²⁺. Gini's measures are 2(11.2 - 8.2) = 6 and 2(13 - 9.6) = 6.8, respectively. The plots indicate
that the data are right skewed and do not come from the same distribution.
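
The quantities read from Figure 8 can be recomputed from Table 1. The sketch below enters the data by hand and uses hypothetical helper names; it is not the original program, only one way to obtain the coordinates of the Min line and Max line plots.

```r
fe3 <- c(2.2, 2.93, 3.08, 3.49, 4.11, 4.95, 5.16, 5.54, 5.68,
         6.25, 7.25, 7.90, 8.85, 11.96, 15.54, 15.89, 18.3, 18.59)
fe2 <- c(4.04, 4.16, 4.42, 4.93, 5.49, 5.77, 5.86, 6.28, 6.97,
         7.06, 7.78, 9.23, 9.34, 9.91, 13.46, 18.4, 23.89, 26.39)

# Sample estimates of E(X_{1:k}) and E(X_{k:k}) from the ordered data.
est_min <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(n - i, k - 1) * xs) / choose(n, k)
}
est_max <- function(x, k) {
  n <- length(x); i <- seq_len(n); xs <- sort(x)
  sum(choose(i - 1, k - 1) * xs) / choose(n, k)
}

k <- 1:18
min_line <- cbind(fe3 = sapply(k, function(kk) est_min(fe3, kk)),
                  fe2 = sapply(k, function(kk) est_min(fe2, kk)))
max_line <- cbind(fe3 = sapply(k, function(kk) est_max(fe3, kk)),
                  fe2 = sapply(k, function(kk) est_max(fe2, kk)))

# Gini's measure from the first two points of the Max line.
gini <- function(x) 2 * (est_max(x, 2) - mean(x))
g_fe3 <- gini(fe3)
g_fe2 <- gini(fe2)
```

Plotting one column of `min_line` (or `max_line`) against the other gives the Min line (or Max line) plot; the first row of each matrix holds the two sample means and the last row the sample minima (or maxima).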



Figure 8: Min Line, Max Line and QQ Plots for the Percentage of Iron Retained at Concentration 1.2 Millimolar

6.2. Application 2: Pareto Distribution

The Pareto distribution is one of the most famous distributions and is widely used in economics, finance and the
natural sciences; see Johnson et al. (1994) and Haseeb et al. (2012). The density of the Pareto I distribution is

$$f(x; \alpha, \beta) = \frac{\alpha \beta^{\alpha}}{x^{\alpha + 1}}, \quad x \ge \beta > 0, \ \alpha > 0,$$

where $\beta$ is the scale parameter and $\alpha$ is the shape parameter; the smaller $\alpha$, the fatter the right
tail of the distribution. For $\alpha \le 2$ the Pareto distribution has infinite variance, and for $\alpha \le 1$ the
expected value does not exist. Figure 9 shows the Min line, Max line and Q-Q plots for data simulated from Pareto I
(10, 3) for two variables $y$ and $y_1$ with n = 180. The Min and Max plots are clearly more stable than the Q-Q plot,
especially at the extreme tails.


[Panels: Min line plot (Sample Minima for y); Max line plot (Sample Maxima for y); quantile quantile plot (Sample quantile)]

Figure 9: Min Line, Max Line and QQ Plots for Simulated Data from Pareto I (10, 3) for Both $y$ and $y_1$ and n = 180

7. CONCLUSIONS 

Min and Max plots based on the minimum and maximum order statistics of size k are proposed in nonparametric and
parametric forms. These plots are especially useful for heavy-tailed distributions, because they give more weight to
the extreme tails. It has been shown that the Min and Max plots characterize the Gumbel and Weibull distributions
nonparametrically using simple linear regression.

Since the normal distribution is very important in practice, the Min-normal and Max-normal plots are introduced,
and it has been shown that they complete the picture of the data given by the Q-Q plot, especially at the extreme
tails of the distribution. A further advantage of the Min and Max plots is that they extend to discrete distributions
such as the Bernoulli, binomial and geometric distributions, where they can be used to test the goodness-of-fit of a
sample graphically and informally, in an exploratory way.

One limitation of the Min and Max plots arises when the expected extreme order statistics are not defined. The Min
and Max plots may still be drawn using the available information and ignoring the undefined values, although in this
case some information will of course be lost.

Appendix A: R-program for Min and Max normal plots

library(EnvStats)                  ### provides evNormOrdStatsScalar()
library(VGAM)                      ### loaded in the original program; not used below

par(mfrow = c(1, 3))               ### 3 graphs in one page

LGd = function(x, t) {             ### function for estimated Min order statistics
  n = length(x); i = 1:n; x = sort(x)
  c1 = 1 / choose(n, t)
  t1 = choose(n - i, t - 1) * x
  c1 * sum(t1)
}

LGo = function(x, t) {             ### function for estimated Max order statistics
  n = length(x); i = 1:n; x = sort(x)
  c1 = 1 / choose(n, t)
  t1 = choose(i - 1, t - 1) * x
  c1 * sum(t1)
}

n = 200; k = 1:n; y = rnorm(n, 500, 20)      ### simulated normal data
wdy = numeric(n); woy = numeric(n); E11 = numeric(n); Ekk = numeric(n)
for (i in 1:n) {
  wdy[i] = LGd(y, i); woy[i] = LGo(y, i)     ### estimated Min and Max order stat.
  E11[i] = evNormOrdStatsScalar(1, i)        ### exact Min order stat.
  Ekk[i] = evNormOrdStatsScalar(i, i)        ### exact Max order stat.
}

plot(E11, wdy, main = "Min-normal plot", col = "red",
     xlab = "Theoretical Minima", ylab = "sample Minima")   ### Min normal plot
M1 = lm(wdy ~ E11); abline(M1)               ### fitting straight line

plot(Ekk, woy, main = "Max-normal plot", col = "blue",
     xlab = "Theoretical Maxima", ylab = "sample Maxima")   ### Max normal plot
M2 = lm(woy ~ Ekk); abline(M2)               ### fitting straight line

qqnorm(y); qqline(y)                         ### Q-Q normal plot